Semi-supervised and Active Image Clustering with Pairwise Constraints from Humans
نویسنده
چکیده
Title of dissertation: Semi-supervised and Active Image Clustering with Pairwise Constraints from Humans Arijit Biswas, Doctor of Philosophy, 2014 Dissertation directed by: Prof. David W. Jacobs Department of Computer Science University of Maryland, College Park Clustering images has been an interesting problem for computer vision and machine learning researchers for many years. However as the number of categories increases, image clustering becomes extremely hard and is not possible to use for many practical applications. Researchers have proposed several methods that use semi-supervision from humans to improve clustering. Constrained clustering, where users indicate whether an image pair belong to the same category or not, is a well-known paradigm for semisupervision. Past research has shown that pairwise constraints have the potential to significantly improve clustering performance. There are two major components to constrained clustering research: how pairwise constraints can be used to improve clustering (e.g: constrained clustering algorithms, distance or metric learning methods) and determining which constraints are most useful for improving clustering (e.g.: active or interactive clustering methods). In this thesis we propose three different approaches to improve pairwise constrained clustering spanning both of these components. First, we propose a distance learning method in non-vector spaces, where the triangle inequality is used to propagate the pairwise constraints to the unsupervised image pairs. This approach can work with any pairwise distance and does not require any vector representation of images. Second, we propose an algorithm for active image pair selection. A novel method is developed to choose the most useful pairs to show a person, obtaining constraints that improve clustering. Third, we study how pairwise constraints can effectively be used to cluster large image datasets. Complete clustering of large datasets requires an extremely large number of pairwise constraints and may not be feasible in practice. We propose a new algorithm to cluster a subset of the images only (we call this subclustering), which will produce a few examples from each class. Subclustering will produce smaller but purer clusters and can be used for summarization, category discovery, browsing, image search, etc.... Finally, we make use of human input in an active subclustering algorithm to further improve results. We perform experiments on several real world datasets such as faces, leaves, videos and scenes and empirically show that our approaches can advance the state-of-the-art in clustering. Semi-supervised and Active Image Clustering with Pairwise Constraints from Humans
منابع مشابه
Active Learning of constraints using incremental approach in semi-supervised clustering
Semi-supervised clustering aims to improve clustering performance by considering user-provided side information in the form of pairwise constraints. We study the active learning problem of selecting must-link and cannot-link pairwise constraints for semi-supervised clustering. We consider active learning in an iterative framework; each iteration queries are selected based on the current cluster...
متن کاملExtracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve the clustering efficiency, most of the state-of-art schemes suffer from the lack of robustness and stability. This paper is aimed at proposing an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...
متن کاملActive semi-supervised fuzzy clustering
Clustering algorithms are increasingly employed for the categorization of image databases, in order to provide users with database overviews and make their access more effective. By including information provided by the user, the categorization process can produce results that come closer to user’s expectations. To make such a semi-supervised categorization approach acceptable for the user, thi...
متن کاملOnline Active Constraint Selection For Semi-Supervised Clustering
Due to strong demand for the ability to enforce top-down structure on clustering results, semi-supervised clustering methods using pairwise constraints as side information have received increasing attention in recent years. However, most current methods are passive in the sense that the side information is provided beforehand and selected randomly. This may lead to the use of constraints that a...
متن کاملFuzzy Clustering with Pairwise Constraints for Knowledge-Driven Image Categorization
The identification of categories in image databases usually relies on clustering algorithms that only exploit the feature-based similarities between images. The addition of semantic information should help improving the results of the categorization process. Pairwise constraints between some images are easy to provide, even when the user has a very incomplete prior knowledge of the image catego...
متن کامل